- Xref: bloom-picayune.mit.edu comp.infosystems.wais:526 news.answers:3347
- Path: bloom-picayune.mit.edu!snorkelwacker.mit.edu!eff!sol.ctr.columbia.edu!spool.mu.edu!nigel.msen.com!nigel.msen.com!not-for-mail
- From: emv@msen.com (Edward Vielmetti)
- Newsgroups: comp.infosystems.wais,news.answers
- Subject: WAIS FAQ part 5 of n: Building a WAIS server
- Date: 5 Oct 1992 15:52:02 -0400
- Organization: Msen, Inc. -- Ann Arbor, Michigan
- Lines: 76
- Approved: emv@msen.com (Edward Vielmetti)
- Message-ID: <1aq6dgINN2tr@nigel.msen.com>
- NNTP-Posting-Host: nigel.msen.com
-
- Archive-name: wais-faq/server-basics
-
- This is a first pass at a "frequently asked questions" series for WAIS.
-
- Part 5 of this FAQ is an overview of the steps you need to take to
- build a WAIS server of your own. (Parts 1-4 and 6-n are not yet
- written, but are in progress, albeit slowly.)
-
- The basic set of steps is:
-
- Select the data you want to serve. This may be as simple as "all
- of the mail in my inbox folder" or as complicated as "all of the
- really *good* articles posted to the net in the last year". You may
- need to do some OCR'ing or some typing to get this step taken care of.
-
- Ensure that you can keep an up-to-date copy of it on your site.
- If you are the original producer of the information this may be easy;
- if it's stored on a remote ftp site then alex or mirror or ftpget can
- keep it in sync; or if it's broadcast out as netnews the netnews
- CD-ROMs or "rkive" will do the trick.
-
- Munge it into a format that the WAIS indexer will understand, or
- write code that will do the indexing on the format you have. It's
- relatively straightforward to index things one file, mail message,
- news article, paragraph, line, or dash-separated piece at a time.
- The documentation is weak on which formats are supported right out
- of the box; if your data is complicated it might take a fair amount
- of work to get this "right".
-
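- As a rough illustration of the "dash-separated piece at a time" case,
- here is a minimal Python sketch that shows how a file would break into
- pieces before you index it. The function name split_on_dashes and the
- rule that a separator is a line of three or more dashes are assumptions
- made for this example, not something the WAIS indexer defines.
-
```python
import re


# Assumption: a separator is a line made up only of dashes (at least
# `min_dashes` of them). Adjust the pattern to whatever your data uses.
def split_on_dashes(text, min_dashes=3):
    """Return the non-empty pieces of `text` found between dash-only lines."""
    separator = re.compile(r"^-{%d,}[ \t]*$" % min_dashes, re.MULTILINE)
    pieces = (piece.strip() for piece in separator.split(text))
    return [piece for piece in pieces if piece]


if __name__ == "__main__":
    sample = "first item\nmore text\n-----\nsecond item\n-----\n"
    for number, piece in enumerate(split_on_dashes(sample), start=1):
        print(number, repr(piece))
```
-
- Looking at the pieces this produces is a cheap way to spot records
- that are too big or too small before you spend time on indexing.
-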
- Index the data with "waisindex". Be sure to note the "-mem" option if
- you have a small-ish machine, the "-stdin" option if you have a lot
- of files scattered all over the place, and so on.
-
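- For the "files scattered all over the place" case, a small Python
- sketch can gather the file names for you. It assumes, as the option
- name suggests, that "waisindex -stdin" reads one file name per line
- from standard input; check the waisindex documentation for your
- release before relying on that. The function name list_files and the
- optional suffix filter are made up for this example.
-
```python
import os
import sys


def list_files(roots, suffixes=None):
    """Yield paths of files under each root directory, in sorted order.

    `suffixes` is an optional list of file-name endings to keep,
    for example [".txt"]; with None, every file is listed.
    """
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # visit subdirectories in a stable order
            for name in sorted(filenames):
                if suffixes is None or name.endswith(tuple(suffixes)):
                    yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # Print one path per line so the output can be piped to the indexer.
    for path in list_files(sys.argv[1:]):
        print(path)
```
-
- Run it with the directories you care about as arguments and pipe the
- output into the indexer.
-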
- Buy some more disk drives; you will need them.
-
- Test the indexes you have to see that they answer the questions you want
- to answer. If you get rotten results you might have rotten data, or
- out-of-date or incomplete data, or files that are broken down into pieces
- that are too big or too small, or so much redundant text that queries
- cannot pick out differences in small details. Go back
- to the "munge" step or even the "select" step if all is not well here.
-
- Edit the resulting ".src" file you get so that it includes the proper
- name of your system, a nice wordy description of what people can
- expect to find in the database, and some examples of good questions.
- These are all finding aids which will help your users use your database.
- Make a note of where you got the original data if that is not apparent.
-
- Arrange for a "waisserver" daemon to be started up out of your
- /etc/rc.local file so that the index is available all of the time.
- Alternatively, add an entry to /etc/inetd.conf and to /etc/services
- so that you can bring up WAIS out of inetd. Take note of the -e
- option so that you can put log files in a safe place.
-
- Search the WAIS directory of servers to make sure no one else is doing the
- exact same thing or, if they are, get in touch with them to collaborate.
-
- Send the .src file to "wais-directory-of-servers@think.com" so that it
- can be included in the master directory. Post an announcement to this
- newsgroup so people can quiz you about it or so that they know about
- new stuff.
-
- Trim the log files that WAIS generates so that you can avoid filling up the
- disk that you just bought and so that you can see what it is that
- people are asking of your servers. Remember that there are privacy
- considerations involved.
-
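- One simple way to do the trimming is to keep only the most recent
- lines of a log. The Python sketch below does that; the function name
- trim_log and the default of 1000 lines are choices made for this
- example, and nothing here depends on the format of the WAIS log.
-
```python
import os
from collections import deque


def trim_log(path, keep_lines=1000):
    """Rewrite `path` so that it holds only its last `keep_lines` lines.

    Returns the number of lines that were dropped.
    """
    total = 0
    tail = deque(maxlen=keep_lines)  # remembers only the newest lines
    with open(path, "rb") as log:
        for line in log:
            total += 1
            tail.append(line)
    if total <= keep_lines:
        return 0  # nothing to trim
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as out:
        out.writelines(tail)
    os.replace(tmp_path, path)  # swap the shortened copy into place
    return total - keep_lines
```
-
- Because this swaps in a new file, a server that already has the log
- open may keep writing to the old one; it is safest to run this while
- the server is stopped or to restart the server afterwards. Copy the
- log somewhere private first if you want to study the queries.
-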
- I think this just about does it. There ends up being a fair amount of
- other stuff you might find useful to know in the course of bringing up
- a server - certainly a working knowledge of news servers, perl, make, cron,
- C, yacc or lex, and shell scripts would not hurt in the slightest.
- It could be made easier, I'm sure, though I suspect that building
- a good index is still art and not yet science.
-
- Edward Vielmetti, vice president for research, MSEN Inc. emv@msen.com
- MSEN Inc., 628 Brooks, Ann Arbor MI 48103 +1 313 998 4562
-